Bookmark
The Easy Way to Extract Useful Text from Arbitrary HTML - AI Depot
ai-depot.com/articles/the-easy-way-to-extract-useful-text-from-arbitrary-html/, posted 2011 by peter in ai development nlp python scraping
This article shows you how to write a relatively simple script that extracts text paragraphs from large chunks of HTML without knowing the page's structure or the tags it uses. It works on news articles, blog pages, and other pages with worthwhile text content…
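The linked article's own code is not reproduced here. As a rough illustration of what structure-agnostic extraction can look like, the sketch below uses a text-density heuristic: for each line of HTML it compares the length of the visible text to the length of the raw markup and keeps lines that are long and mostly text. The function name `extract_paragraphs` and the thresholds `min_density` and `min_chars` are assumptions made for this example, not values taken from the article.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(fragment):
    """Return the fragment's text with tags removed and whitespace collapsed."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()  # flushes any text still buffered by the parser
    return " ".join("".join(collector.parts).split())


def extract_paragraphs(html, min_density=0.5, min_chars=40):
    """Keep lines whose visible text is long and dominates the markup.

    min_density and min_chars are illustrative guesses; tune them per site.
    """
    # Drop script and style blocks first, since their contents are not prose.
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", "", html)
    kept = []
    for line in html.splitlines():
        line = line.strip()
        if not line:
            continue
        text = visible_text(line)
        density = len(text) / len(line)  # share of the line that is plain text
        if len(text) >= min_chars and density >= min_density:
            kept.append(text)
    return kept
```

Calling `extract_paragraphs(page_source)` on a downloaded page returns a list of candidate paragraphs. Because the heuristic works line by line, it depends on the markup having reasonable line breaks; minified HTML would need to be split on block-level tags first.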